Detecting adaptive changes in gene copy number distribution accompanying the human out-of-Africa expansion

Genes with multiple copies are likely to be maintained by stabilizing selection, which places an upper bound on the expansion of copy number. We designed a model in which copy number variation is generated by unequal recombination; the model fits the data well for several genes surveyed in three human populations. Based on this theoretical model and computer simulations, we asked whether the gene copy number distribution in the derived European and Asian populations can be explained by a purely demographic scenario or whether shifts in the distribution are signatures of adaptation. Although the copy number distribution in most of the analyzed gene clusters can be explained by a bottleneck, such as in the out-of-Africa expansion of Homo sapiens 60–10 kyrs ago, we identified several candidate genes, such as AMY1A and PGA3, whose copy numbers are likely to differ among African, Asian, and European populations.


Introduction
Gene copy number variation (CNV) refers to the presence of multiple copies of a gene family within a genome, resulting from duplications, deletions, or rearrangements. Combined with their high mutation rate, CNVs constitute a significant driver of genomic variability that allows for rapid adaptive evolution in response to environmental changes (Sudmant et al. 2015; Brahmachary et al. 2014; Carvalho and Lupski 2016; Iskow et al. 2012; Sebat et al. 2004).
A well-studied example of CNV within human populations is provided by the salivary amylase gene, whose copy number is hypothesized to correlate with the extent of dietary starch consumption, not only in humans but also in other species (Pajic et al. 2019; Atkinson et al. 2018; Carpenter et al. 2015; Usher et al. 2015; Falchi et al. 2014; Perry et al. 2007).
In general, copy number variation may result from different evolutionary forces. Demographic events, such as population migrations and expansions, can lead to changes in gene frequencies and distributions over time. Simultaneously, natural selection acts on genetic variation, favoring advantageous alleles and promoting their spread within populations.
It is known that both demographic effects and selection may produce similar patterns in single nucleotide variants as well as in structural variants, making it difficult to disentangle these forces (Lohmueller 2014; Stajich 2004). For SNP or allele frequency data, there are well-developed statistics (e.g., Tajima 1989; Fu 1997) that are "standardized" so that a genomic baseline can be established, from which loci under selection may be detected. However, such a genomic baseline is not available for gene copy number variation data. Therefore, we resorted to a more basic approach involving modelling and computer simulations.
We have recently examined the evolutionary dynamics of multicopy gene families with respect to selective pressure and unequal recombination (Otto et al. 2022). That study focused on analyzing the impact of stabilizing selection on gene copy numbers, while considering the role of recombination as a randomizing mechanism that introduces variability within the population.
By expanding this model, we aimed to assess whether gene copy number alterations observed within human populations could be attributed solely to demographic events or whether selective pressures have played a role in shaping these variations.
In this study, we conducted extensive simulations under various scenarios of human demography and selective changes. By disentangling the effects of these two forces, we sought to gain a deeper understanding of the evolutionary processes driving gene copy number variation in human populations. Based on empirical data of human gene copy numbers, we identified several candidate genes whose copy numbers are likely to be selected differently among African, Asian and European populations.

Gene copy number variation in human
We started with the dataset provided by Brahmachary et al. (2014). Using Nanostring technology, they estimated gene copy numbers of 180 gene families in 165 individuals from three populations (60 African Yoruba, YRI; 60 Central European, CEU; and 45 Asian, CHB), based on data collected in the framework of the 1,000 Genomes Project (Sudmant et al. 2010).
While some of these loci showed copy numbers of > 100 copies (DUX4 even up to 600), we focused on intermediate copy numbers and removed all satellite loci, genes on sex chromosomes, genes with minimum copy number below 2, and genes with mean copy number (in YRI) below 5 or above 60. For genes that have two primer sets, only one was taken. We used t-test and F-test statistics to select gene families with significant differences in mean or standard deviation in the YRI-CHB or YRI-CEU comparisons and removed those that showed no statistical evidence in either comparison. This resulted in 42 gene families (see Table 1); the copy number distributions of four of them are shown in Fig 1.
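The selection steps above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the toy copy numbers are hypothetical, the Welch form of the t statistic is an assumption (the text only says "t-test"), and p-values are omitted because the Python standard library has no t or F distribution.

```python
from math import sqrt
from statistics import mean, variance

def passes_filters(copy_numbers, yri, min_copies=2, mean_low=5, mean_high=60):
    """Copy-number filters from the text: minimum >= 2 and YRI mean in [5, 60]."""
    return min(copy_numbers) >= min_copies and mean_low <= mean(yri) <= mean_high

def welch_t(a, b):
    """Welch's t statistic for a difference in means (Welch form is an assumption)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def f_ratio(a, b):
    """Ratio of sample variances, the statistic underlying an F-test."""
    return variance(a) / variance(b)

# Toy copy numbers, invented for illustration only.
yri = [6, 7, 8, 7, 9, 6]
chb = [9, 10, 11, 10, 12, 9]

print(passes_filters(yri + chb, yri))
print(round(welch_t(yri, chb), 2), f_ratio(yri, chb))
```

In practice each statistic would be compared against the corresponding reference distribution at α = 0.05, as stated in the caption of Table 1.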

Unequal recombination model
In a recently developed model we considered unequal recombination and selection to describe the evolution of tandem gene arrays (Otto et al. 2022). We briefly summarize the main findings. Consider two chromosomes with gene arrays of size y_1 and y_2. A recombination event happens at rate r and may produce a gamete whose gene array size follows a trapezoidal distribution (Otto et al. 2022); see Fig 2A for an illustration. We apply a fitness function in which each newly arising copy has a positive, yet decreasing, benefit s_x. This is motivated by assuming a beneficial effect, yet with diminishing returns, either of increased gene dosage or of increased allelic diversity within an individual (Otto et al. 2022). At the same time, we assume additional copies to be selected against with an increasing selective disadvantage s_y. This is motivated by assuming an increasing cost of replication, of gene processing and of maintaining genome integrity. Both effects are cast in a double-epistatic fitness function with two selection coefficients (s_x, s_y), governed by a single epistasis parameter (ε). To avoid the trivial long-term evolutionary equilibrium of one copy, we assume s_x > s_y. Furthermore, we assume ε = 0.05 to be constant in the following. In summary, the fitness of a diploid individual with total gene copy number y is given by the fitness function ω(y) of equation (1). This leads to an optimal copy number y_opt, which is determined by the ratio s_x/s_y when ε is kept fixed. See Fig 2B for an example. The population is then simulated according to a Wright-Fisher model with non-overlapping generations and with selection and recombination as described above. It was shown that in the deterministic model the equilibrium copy number distribution is centered around y_opt and is well approximated by a Gamma distribution (Otto et al. 2022). Furthermore, the coefficient of variation C_V = σ/ȳ is correlated with the logarithm of the recombination-selection ratio, log(r/s_x). With strong selection and low recombination, the distribution is concentrated tightly around the optimal value, whereas higher r and lower s_x lead to a widespread distribution. For convenience, we introduce two new parameters:
• q_S = s_x/s_y, the 'selection ratio', which determines the optimal copy number y_opt (for fixed ε = 0.05);
• q_R = r/s_x, the 'recombination/selection ratio', which measures the impact of unequal recombination compared to the selective pressure of the fitness function and therefore determines the coefficient of variation C_V = σ/ȳ of the equilibrium distribution.
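To make the trapezoidal shape concrete, the following sketch enumerates recombinant array sizes for the example of Fig 2A (y_1 = 5, y_2 = 4). It assumes that a recombinant keeps i copies from one array and j copies from the other, with both breakpoints drawn uniformly (i from 0..y_1, j from 0..y_2). This breakpoint rule, including whether empty arrays are allowed, is an assumption made for illustration and not necessarily the definition used in Otto et al. (2022).

```python
from collections import Counter
from fractions import Fraction

def recombinant_size_distribution(y1, y2):
    """Exact distribution of i + j with i uniform on 0..y1 and j uniform on 0..y2."""
    counts = Counter(i + j for i in range(y1 + 1) for j in range(y2 + 1))
    total = (y1 + 1) * (y2 + 1)
    return {size: Fraction(n, total) for size, n in sorted(counts.items())}

# Example of Fig 2A: parental arrays with 5 and 4 copies.
dist = recombinant_size_distribution(5, 4)
for size, prob in dist.items():
    print(size, prob)
```

Under this assumption the probabilities rise linearly, stay flat between min(y_1, y_2) and max(y_1, y_2), and then fall linearly, which is the trapezoid referred to in the text.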

Regression
We aim to quantify the effect of (r, s_x, s_y) on the resulting equilibrium copy number distribution and, vice versa, to estimate the underlying parameter triple for given empirical data. To analyze the equilibrium distribution of the unequal recombination process under drift, we simulated the population evolution under different parameter settings. The code for all following simulations is available at https://github.com/y-zheng/Distinguishing-roles-adaptation-demography-gene-copy-number-changes-human-populations.
Population size is kept at N = 5,000, and each chromosome initially carries 5 copies. The input parameters are given in Table 2. Together, they define 324 triples (r, s_x, s_y). Additionally, we generated 160 random pairs such that q_R is between 0.01 and 5 and y_opt is between 4 and 60, and combined them with the four recombination rates, leading to a total parameter set of 964 combinations. We disregarded triples with selective strength s_x > 0.1 to keep the parameter range realistic.
For each of these parameter combinations, we evolved the population under the given selection scheme for 5 million generations. The first 200,000 generations were discarded as burn-in, and the population statistics (mean copy number ȳ and standard deviation σ) were recorded every 20,000 generations.
In total, this results in ≈ 185,000 data points, which we used to determine the relationship between the input parameters (r, s_x, s_y) and the output population statistics (ȳ, σ).
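The sizes quoted above can be checked with a short calculation. The factorization 4 × 9 × 9 follows the parameter counts listed for Table 2. The last line is only a rough inference: it assumes that every retained run contributes the same number of records.

```latex
\begin{align*}
4 \cdot 9 \cdot 9 &= 324 && \text{grid combinations of } (r,\ q_R,\ y_{\mathrm{opt}}) \\
160 \cdot 4 &= 640 && \text{random pairs} \times \text{recombination rates} \\
324 + 640 &= 964 && \text{total parameter combinations} \\
\frac{5{,}000{,}000 - 200{,}000}{20{,}000} &= 240 && \text{records per run after burn-in} \\
\frac{185{,}000}{240} &\approx 771 && \text{runs implied by } \approx 185{,}000 \text{ data points}
\end{align*}
```

Under these assumptions, roughly 771 of the 964 combinations would remain after discarding triples with s_x > 0.1.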
As indicated in Otto et al. (2022), we expect the mean copy number ȳ to be close to its optimal value y_opt and C_V to be correlated with log(q_R). Indeed, the corresponding regressions (equation (3)) yield r²-values of 0.9842 and 0.9088, respectively. We calculated the q_S and q_R ratios based on ȳ and C_V from the gene copy numbers (see Table 1) using the regression formula (3) with four recombination rates r = 0.001, 0.002, 0.005 and 0.01. Results for the four candidate genes shown in Fig 1 are given in Table 3.
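The regression-and-inversion step can be illustrated as follows. This is a sketch under stated assumptions: a linear relation C_V = a + b·log(q_R) with the natural logarithm, and toy coefficients a = 0.5, b = 0.1 that are placeholders and not the fitted values of equation (3). The helper names `fit_line` and `estimate_s_x` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)

def estimate_s_x(cv, r, a, b):
    """Invert C_V = a + b*log(q_R) for q_R, then use s_x = r / q_R."""
    q_r = math.exp((cv - a) / b)
    return r / q_r

# Toy data: q_R values as listed for Table 2, C_V from placeholder coefficients.
q_values = [0.01, 0.02, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0]
cv_values = [0.5 + 0.1 * math.log(q) for q in q_values]

a, b, r2 = fit_line([math.log(q) for q in q_values], cv_values)
print(a, b, r2)
print(estimate_s_x(cv=0.5, r=0.001, a=a, b=b))
```

With real simulation output, the fit would be noisy and r² below 1; the inversion then gives one s_x estimate per assumed recombination rate r, which is why Table 3 reports estimates for each r separately.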

Demography simulations
To determine whether significant changes of mean and variance of the copy number distribution (Table 1) can be explained by the demographic history of human populations, we examined a total of 6 different scenarios (enumerated as I-VI), see Fig 3.

Simulation of the bottleneck model
First, we ran a simple bottleneck model with three different population reductions. Each run is divided into three phases:
(1) Burn-in phase. For each gene we used the estimated (r, s_x, s_y) triple based on the dataset from YRI. These parameters were chosen as input to produce an equilibrium population of N = 10,000 by a burn-in process of 200,000 generations. 'Independent' equilibrium populations are produced by recording the population state every 20,000 generations.
(2) Bottleneck. From equilibrium we reduced the population size to N = 100, 500 or 1,000, denoted scenario I, II and III, respectively, and kept it at that size for 5,000 generations.
(3) Recovery phase. At the end of the bottleneck, the population is reset to N = 10,000 and the copy number distribution is recorded every 50 generations until generation 1,000 after the bottleneck.
We ran the bottleneck simulations I-III on all gene families given in Table 1, with recombination rates r = 0.001, 0.002, 0.005 and 0.01, and discarded parameter combinations with s_x outside the interval [0.001, 0.1] in YRI. This gives a final total of 42 gene families and 95 gene-r combinations. For each combination of gene, recombination rate and bottleneck population size, 10,000 replicates were produced (from 100 'independent' starting equilibria).
We then traced the mean and C_V along the recovery phase and compared these with the empirical data from the CHB and CEU populations.
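The three phases can be sketched as a Wright-Fisher loop with a piecewise-constant population size. This is a scaled-down illustration, not the authors' simulation code: the fitness function below is a placeholder with diminishing benefit and growing cost and is NOT equation (1), the uniform-breakpoint recombination rule is an assumption, and the population sizes and generation counts are far smaller than those used in the paper. Only r, s_x, s_y and ε take values that appear in the text.

```python
import math
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def fitness(y, s_x, s_y, eps=0.05):
    """Placeholder fitness (NOT equation (1)): diminishing benefit, growing cost."""
    benefit = sum((1 - eps) ** i for i in range(1, y + 1))
    cost = sum((1 + eps) ** i for i in range(1, y + 1))
    return math.exp(s_x * benefit - s_y * cost)

def gamete(parent, r, rng):
    """One gamete: unequal recombination with probability r, else a parental array."""
    y1, y2 = parent
    if rng.random() < r:
        # Assumed rule: uniform breakpoints on both arrays.
        return rng.randint(0, y1) + rng.randint(0, y2)
    return rng.choice(parent)

def next_generation(pop, size, r, s_x, s_y, rng):
    """Sample parents proportionally to fitness and form `size` diploid offspring."""
    weights = [fitness(a + b, s_x, s_y) for a, b in pop]
    parents = rng.choices(pop, weights=weights, k=2 * size)
    return [(gamete(parents[2 * i], r, rng), gamete(parents[2 * i + 1], r, rng))
            for i in range(size)]

def simulate(schedule, r, s_x, s_y, start=5, seed=1):
    """Run phases given as (population size, generations); return mean y per phase."""
    rng = random.Random(seed)
    pop = [(start, start)] * schedule[0][0]
    phase_means = []
    for size, generations in schedule:
        for _ in range(generations):
            pop = next_generation(pop, size, r, s_x, s_y, rng)
        phase_means.append(sum(a + b for a, b in pop) / len(pop))
    return phase_means

# Scaled-down burn-in, bottleneck and recovery (the paper uses N = 10,000 / 100 / 10,000).
schedule = [(200, 50), (20, 50), (200, 50)]
print(simulate(schedule, r=0.005, s_x=0.05, s_y=0.0025))
```

Repeating such a run over many seeds and recording the mean and C_V at the end of the recovery phase would give the replicate distributions against which the CEU and CHB values are compared.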

Simulation of the human population history
A more realistic human population history is given by the Genetic Algorithm for Demographic Model Analysis (GADMA) (Noskova et al. 2020), which also includes migration between subpopulations. We ran simulations on four candidate genes (AMY1A, PGA3, SULT1A3, DEFA1) with the following modification of the GADMA demography: As ancestral population (N = 9,900 in GADMA), we used the equilibrium populations (N = 10,000) from the previous section. We therefore started the simulation 5992 generations before present, roughly corresponding to the onset of the 'out-of-Africa' expansion, when the Eurasian population split from the ancestral population and experienced a sharp bottleneck. To reduce computation time, we did not simulate the continued evolution of the African (YRI) population, since we assumed it to be in equilibrium; for migration from YRI to Eurasian populations, we drew samples from the ancestral population. At 896 generations before present, CEU and CHB split from each other and started to evolve with reciprocal migration and exponentially increasing population size. In the following, we refer to this simulation as scenario IV. At 'present', copy number distributions (mean and variance) were recorded. For each gene and recombination rate combination, 10,000 replicates were produced.

Table 2 Input parameters.
4 recombination rates r: 0.1%, 0.2%, 0.5% and 1%
9 recombination/selection ratios q_R = r/s_x: 0.01, 0.02, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0
9 optimal copy number values y_opt: 10, 15, 20, 25, 30, 35, 40, 45, 50
We also ran the same population model with a change of the selection parameters at either 500 or 896 generations before present (the latter being the CEU/CHB split time). The new selection parameters (s_x and s_y) differ between the CEU and CHB populations and are estimated from the present CEU/CHB distributions (see Table 3). These simulations are hereafter called scenario V (selection change 500 generations before present) and scenario VI (selection change 896 generations before present).

Rejecting a purely demographic model
Given the copy number distribution for a gene family in the ancestral (YRI) population, we ask whether the observed distribution in the derived population (CEU or CHB) can be explained by a purely demographic model (various bottlenecks with constant selection pressure, as modelled in scenarios I to IV) or not (demography plus a change in selection pressure, as modelled in scenarios V and VI). To decide this, we use the following strategy. For each scenario I-VI and each parameter triple estimated from the YRI population, 10,000 replicates were produced. From each resulting distribution, we record the mean ȳ and standard deviation σ. This results in a parameter distribution for each scenario. If a chosen empirical data set has a mean ȳ or standard deviation σ that falls outside the central 95% quantile range of the 10,000 simulated values, we conclude that this particular scenario is to be rejected as a possible explanation of the data. In particular, we reject a purely demographic explanation if scenarios I-IV are all rejected.
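The decision rule can be written down compactly. The sketch below is an illustration with hypothetical function names; it uses a simple order-statistic rule for the central interval, which is one of several reasonable quantile conventions and an assumption here.

```python
import math

def central_interval(values, level=0.95):
    """Empirical central interval of simulated values (order-statistic rule)."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1 - level) / 2
    lower = ordered[math.floor(tail * (n - 1))]
    upper = ordered[math.ceil((1 - tail) * (n - 1))]
    return lower, upper

def scenario_rejected(sim_means, sim_sds, obs_mean, obs_sd, level=0.95):
    """Reject a scenario if the observed mean OR sd lies outside its central range."""
    m_lo, m_hi = central_interval(sim_means, level)
    s_lo, s_hi = central_interval(sim_sds, level)
    return not (m_lo <= obs_mean <= m_hi) or not (s_lo <= obs_sd <= s_hi)

def purely_demographic_rejected(rejected_by_scenario):
    """A purely demographic explanation is rejected only if I-IV are all rejected."""
    return all(rejected_by_scenario[s] for s in ("I", "II", "III", "IV"))

# Toy replicate values, invented for illustration only.
simulated = list(range(1001))
print(central_interval(simulated))
print(scenario_rejected(simulated, simulated, obs_mean=990, obs_sd=500))
```

In the actual analysis, `sim_means` and `sim_sds` would each hold the 10,000 replicate values for one gene, recombination rate and scenario.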

Results and Discussion
In this study, we conducted an analysis of multicopy gene family evolution using a model that incorporates unequal recombination and selection. Our investigation aimed to examine the copy number changes observed in subpopulations of Europe, Asia, and Africa and to determine whether these changes could be attributed to constant selective pressure combined with demographic factors, or whether they require an adaptive change of selection together with demography. Our findings reveal that the observed copy number variations in several genes cannot adequately be explained by demographic processes alone, suggesting a (possibly adaptive) change of selective pressure in the derived populations.
Based on the data of Brahmachary et al. (2014), we chose 42 gene families of intermediate copy number that show significant differences in their distribution among different populations (Table 1). Although the raw data rely on phase I of the 1,000 Genomes Project, they proved to be most suitable for our analyses. More recent data, for instance from the human pangenome project (Liao et al. 2023), still lack sufficient coverage of the different subpopulations.
When comparing the copy number distributions of the 42 candidate genes in the Asian and European populations with the African population (assumed to be in equilibrium), we observe 61 significant changes in mean copy number and 29 significant changes in the variance (Table 1), of which only seven show a decrease of variance (one example is DEFA1). Within our model, assuming a constant recombination rate among subpopulations and no demographic changes, a decreased variance (or standard deviation) can only be achieved by an increase of positive selection (s_x), since C_V is determined by q_R = r/s_x, see equation (3). However, the most common case is that of a consistent significant shift in the mean in both derived populations without affecting the variance, i.e. either (++ | 00) or (-- | 00), which occurs in 12 of the 42 analyzed genes. Only one gene (PGA3) showed opposite significant changes of the mean (increasing in Asia but decreasing in Europe).
To test whether a change of population size is sufficient to explain these significant differences in copy number statistics (shown in Table 1), we ran the simple bottleneck scenarios I-III with 95 parameter combinations (r, s_x, s_y) based on the ancestral copy number distribution of YRI and the regression equation (3). As an example, for PGA3 and r = 0.001 we ran simulations with selection coefficients s_x = 0.0122 and s_y = 0.0066, see Table 3. Fig 4 shows the mean gene copy number of 10,000 simulated bottleneck populations over time for each recombination rate (r = 0.001, 0.002 and 0.005). Note that for this gene the value r = 0.01 was omitted, since the selection strength s_x would exceed the threshold of 0.1. Gray boxes indicate the central 50% quantile range, white boxes the 95% range and whiskers the 99% range. With a strong bottleneck (reduction to N = 100 for 5,000 generations) and under low recombination and, hence, weak selection (r = 0.001, with q_R = r/s_x and q_S = s_x/s_y held constant), we find the widest variation among the 10,000 replicates. Higher r and stronger selection result in a mean value close to that of the YRI population, i.e. the value we would expect from the initial parameter estimation.
In this particular example, the empirical data show a significantly higher mean copy number of PGA3 in CHB (red line) and a lower mean value in CEU (blue line) compared to YRI (black horizontal line). It is the only gene in our set that shows significant shifts of mean copy number in opposite directions. Only under a strong bottleneck and with low recombination could these changes be explained without invoking a change of selection intensity.
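As a consistency check on the PGA3 example, the two ratios can be computed from the quoted coefficients and then used to rescale s_x for the other recombination rates. This worked illustration assumes that q_R is held fixed across recombination rates, as stated in the text; the rounded numbers are approximate and the entries of Table 3 may differ by rounding.

```latex
\begin{align*}
q_R &= \frac{r}{s_x} = \frac{0.001}{0.0122} \approx 0.082, &
q_S &= \frac{s_x}{s_y} = \frac{0.0122}{0.0066} \approx 1.85, \\
s_x(r = 0.002) &= \frac{0.002}{q_R} \approx 0.0244, &
s_x(r = 0.005) &= \frac{0.005}{q_R} \approx 0.061, \\
s_x(r = 0.01) &= \frac{0.01}{q_R} \approx 0.122 > 0.1 .
\end{align*}
```

The last value exceeds the threshold of 0.1, which is consistent with r = 0.01 being left out for this gene.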
The results for all 95 parameter combinations obtained for scenario I (strong bottleneck) are summarized in Table 4. To test whether observed means or standard deviations can be explained by scenario I, we considered the time point after 1,000 generations of recovery (first row of Fig 4, last boxplot in each panel). If the mean or C_V, respectively, lies within the 95% quantile, we indicate nonsignificant differences with a 0. Significant changes are marked with a single asterisk * (α = 5%) or a double asterisk ** (α = 1%). Taking PGA3 again as an example, we find a mean value that is significantly smaller in CEU than in YRI (marked with -). With r = 0.001, this might be explained by a bottleneck (denoted by 0), whereas for r = 0.002 and r = 0.005 we find a significant difference (**) and the bottleneck explanation to be highly unlikely. Higher recombination (r = 0.01) led to s_x values greater than 0.1 in CHB and YRI (see Table 3) and hence was omitted.

Table 4 Results of bottleneck simulations. We ran simulations of scenario I (the strongest bottleneck with a reduction to N = 100) with parameters (r, s_x, s_y) estimated from the YRI data and tested whether, after 1,000 generations of recovery, the mean and standard deviation σ of the CEU and CHB data can be explained by a bottleneck. Blank space indicates that this parameter combination led to an s_x value outside the range 0.001 to 0.1, and hence no simulation was run. The columns with 0, + and - indicate whether there is a significant difference in the empirical data set (see Table 1). In the r1-r10 columns, a 0 indicates that the data can be explained by a bottleneck. * and ** show significant differences (5% and 1%) between the simulated and empirical data. The four candidate genes which were used for further simulations are highlighted with a light gray background.
If we consider a significant difference in the mean of CEU compared to YRI (28 genes; non-zero entries in the first column of Table 4), we find that only 17 out of 65 simulated parameter combinations in scenario I can explain these differences. For significant mean changes in CHB (33 genes; Table 4), 22 out of 72 parameter combinations are compatible with the observation, while the remaining 50 cannot explain the significant difference. For example, AMY1A and PGA3 both showed an increased mean value in CHB. In neither case, and for none of the parameter combinations, is scenario I sufficient to explain the observation.
From the candidates with a significant difference in mean or variance we selected the three genes coding for digestive enzymes, AMY1A, SULT1A3 and PGA3, and the defense gene DEFA1 for a more detailed analysis and tested the GADMA model (Noskova et al. 2020) without and with a selection change according to the estimates from regression (scenarios IV-VI).
Fig 5 shows the mean copy number and coefficient of variation C_V at present, simulated according to scenarios I, IV, V and VI for 10,000 replicates each. As in scenarios I-III, the terminal values generated in scenario IV are close to the initial ones of the ancestral YRI data set. Therefore, even the more realistic GADMA migration model often fails to explain the data found in CEU and CHB when considering constant selection parameters derived from the ancestral YRI population. However, when selection strength is allowed to change, as in scenarios V or VI, a different picture emerges: Consider a change of s_x and s_y at either 500 or 896 generations before present to the values estimated from equation (3), given in Table 3. Then, the simulations return a mean and C_V that are closer to the values found in the CEU and CHB data. Indeed, the empirical data often lie within the 95% or 99% quantiles of the simulated data distributions. We observe no strong difference between the results of scenarios V and VI, suggesting that 500 generations represent a sufficiently large time span to reach a new equilibrium.
Hence, one possible explanation for the shift in the copy number distribution of the four candidate genes is a change of selection pressure and adaptation.
The AMY1A gene, which encodes amylase, an enzyme that breaks down starch, has a strongly increased mean and σ in the Asian population, likely linked to adaptations to high grain intake. In the European population, while the variation is increased, the change in mean copy number is small. These findings are in agreement with the results of several studies indicating that individuals from populations with high-starch diets have, on average, more gene copies than those with traditionally low-starch diets (Perry et al. 2007; Pajic et al. 2019; Atkinson et al. 2018). Under our model, selection strength is relaxed in CEU and CHB by a factor of 4, such that a higher copy number is not selected against and a more widespread distribution of CNV can evolve. A recent study (Inchley et al. 2016) has suggested a more complicated model of amylase evolution, involving two steps: an expansion from one to several copies after the human-Neanderthal split, but before the separation of modern human populations, and a subsequent shift of the optimal gene copy number, independently in different populations. This study also suggests that the increase of AMY1 copy number occurred in South America even more dramatically than in East Asia, a hypothesis which should be tested in the framework of our model as soon as suitable data become available.
SULT1A3 is a gene in the SULT (sulfotransferase) family, whose members catalyze the sulfation of a variety of substrates, especially catecholamines including dopamine and epinephrine (Brix et al. 1999; Dajani et al. 1999). Polymorphisms in SULT1A3 and SULT1A4 have been shown to affect the metabolism of therapeutic drugs (e.g., Hui and Liu 2015; Bairam et al. 2019), and these genes have therefore been studied extensively in the framework of medico- and pharmacogenetics (Thomae et al. 2004; Hildebrandt et al. 2004). In the dataset analyzed, SULT1A3 has a reduced mean copy number in Asia but not in Europe. The reduced mean (from 7.6 in YRI to 7.0 in CHB) is a significant difference, which cannot be explained by a simple bottleneck scenario with a recombination rate higher than 0.002 (see Table 4). If one considers a change of selection as in scenarios V and VI, we expect a stronger selective pressure in CHB (rising from s_x = 0.03 to s_x = 0.05 for r = 0.002). There have been a few studies on copy numbers of SULT1A3/4 genes. Hildebrandt et al. (2004) first noted a possible duplication of SULT1A3 and identified a duplicated copy in all four human populations studied. More recently, a study of 172 human individuals discovered variable SULT1A3/4 copy numbers from 1 to 10, and associated copy number with the risk and onset of neurodegenerative disease (Butcher et al. 2017). Note that SULT1A3 and SULT1A4 are closely related paralogs that are often difficult to distinguish, and studies on copy number usually treat them together.
PGA3 (Pepsinogen, precursor for pepsin, an enzyme that breaks down protein to smaller peptides) is associated with prostate-specific antigen production. It is the only gene in our list to have opposite changes in the two derived populations: its mean copy number increases in Asia and decreases in Europe. As Asian and European humans share most of the same bottleneck period, the diverging copy number distribution is highly unlikely to be a demographic effect, and complex selection patterns are needed to explain the data. Indeed, the bottleneck simulations shown in Table 4 and the simulations V and VI with a change of selection parameters as shown in Fig 5 support this hypothesis. When considering the estimates of Table 3, we observe a small increase of s_x in Asia compared to Africa but a strong decrease of both s_x and s_y in Europe, which accounts for the increased variance in copy number in CEU. The copy number variation at the Pepsinogen (PGA) locus was originally discovered with electrophoresis, and three individual genes (named PGA 3, 4, 5) were initially found (Taggart et al. 1985). Pepsinogen genes have been shown to duplicate and become lost recurrently in vertebrates (Castro et al. 2014). The pepsinogen genes were also shown to have variable expression levels in tumor cells, particularly a reduction of PGA expression in esophageal, stomach and thyroid cancers (Shen et al. 2020). This could be an additional source of selective pressure besides protein metabolism. While the simplest explanation is that dietary differences between Asian and European populations during the spread of agriculture (in the last 5000-10000 years) are the driver of PGA copy number changes, alternative hypotheses involving tumor suppression or interaction with other enzymes must be considered.
Finally, we analyzed the immune gene Alpha-defensin DEFA1. It codes for defensins, proteins that are involved in innate (non-learned) immunity, specifically in antimicrobial defense against a broad spectrum of microorganisms, including bacteria, fungi, and viruses. DEFA1 shows a decrease in variance in both Asia and Europe, indicating stronger selective pressures. More precisely, when considering the distribution in Fig 1, one observes four individuals in the YRI population with high copy number, which indicates a relaxed selective pressure in Africa. With equation (3) we find selection coefficients 10-fold smaller in Africa than in Europe and Asia (see Table 3). Alpha-defensins are expressed in neutrophil cells and intestinal epithelial cells, acting as microbicidal agents (Ganz et al. 1985; Ayabe et al. 2000; Nassar et al. 2007). The genes DEFA1 and DEFA3 code for some of the Alpha-defensins (HNP1/2/3), and appear to be "interchangeable variant cassettes" within a tandem array of 19kb (Aldred et al. 2005). Copy number variation of DEFA1 is present in all apes including gibbon, but the version identified as DEFA3 is human-specific; the copy number has also been demonstrated to affect expression level (Aldred et al. 2005). Low copy number of DEFA1/3 has been shown to be associated with hospital-acquired infection (Zhao et al. 2018) as well as kidney diseases (Ai et al. 2016). On the other hand, and counterintuitively, a high copy number of DEFA1/3 may lead to more severe cases of sepsis (Chen et al. 2010, 2019) and is associated with Crohn's Disease (Jespersgaard et al. 2011), and is thus selected against. The trade-off between infectious and autoimmune diseases could lead to selection towards an intermediate copy number of Alpha-defensins. Therefore, our results suggest the possibility that the out-of-Africa expansion was accompanied by such a change in environmental pathogen diversity that a delicately tuned dosage of defensin is required. This is corroborated by the fact that YRI has a few individuals with very high (outlier) copy numbers of DEFA1, which cannot be found in CHB or CEU.
In conclusion, while both demographic effects and shifts in selection schemes can result in changes in copy number distributions, in some of our candidate genes the former are not sufficient to explain the observation. Adaptive processes can induce new relationships between copy number and fitness, and impact the resulting copy number distribution. Importantly, changes in the strength or direction of selection may become manifest not only in the mean copy number, but also in the variance or in compound statistics, such as the coefficient of variation.

PGA3
the copy number distributions of four of them are shown in Fig 1.

Figure 3 Illustration of simulated demography scenarios. The height of the blue boxes denotes the population size. Scenarios I-III cover a simple bottleneck lasting 5,000 generations, whereas scenarios IV-VI are modifications of the GADMA model with migration between subpopulations and a change of selection parameters (indicated by red stars) in scenarios V and VI. The gray color in the YRI population indicates that it is at equilibrium.

Figure 5 Comparison of mean copy number ȳ and coefficient of variation C_V = σ/ȳ in four scenarios (I, IV, V, VI) for candidate genes AMY1A, SULT1A3, PGA3 and DEFA1. In scenarios V and VI the initial selection coefficients of YRI (see Table 3) were changed 500 and 896 generations before present, respectively. Simulation results are shown for the lowest and highest recombination rate. Black lines refer to the mean and C_V of the experimental data in YRI, blue to CEU and red to CHB.
the simulations V and VI with a change of selection parameters as shown in Fig 5 support this hypothesis. When considering the estimates of Table

Table 1 Differences in mean and variance between populations. 0 indicates no significant change, '+' a significant increase and '-' a significant decrease (t-test for the mean and F-test for the standard deviation; α = 0.05).

Figure 2 A. Sketch of the unequal recombination process. Starting with two chromosomes with y_1 = 5 and y_2 = 4 gene copies, two break points are chosen. One of the recombinants is then propagated. Its copy number (here y = 6) follows a trapezoidal distribution, as shown in Otto et al. (2022). B. Example of the fitness function ω(y) (equation (1)) with ε = 0.05, s_x = 0.05, s_y = 0.0025,

Table 3 Estimates of selection coefficients s_x, s_y under four recombination rates r = 0.001, ..., 0.01 based on regression equation (3). The displayed gene families are the ones of Fig 1 for all three populations. Values in parentheses are out of the range 0.001 < s_x < 0.1 in